Speed up _is_eol_token #1256

Closed

correctmost
Contributor

This change provides a small speed-up on large codebases: roughly 200 ms off an 18.3 s run against the yt-dlp codebase.

Stats

Before

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.940    0.000    1.245    0.000 pycodestyle.py:1831(_is_eol_token)
  1472360    0.359    0.000    0.359    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.349 ± 0.161 | 18.119 | 18.641 | 1.00 |

After

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.458    0.000    0.459    0.000 pycodestyle.py:1831(_is_eol_token)
   225341    0.057    0.000    0.057    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.145 ± 0.098 | 17.997 | 18.283 | 1.00 |

Set-up

I profiled pycodestyle with the yt-dlp codebase because it is similar in composition to a private codebase I have.

git clone https://github.com/yt-dlp/yt-dlp.git
cd yt-dlp

git checkout ef36d517f9b05785d61abca7691d9ab7d63cc75c

# Callgraph command
python -m cProfile -o stats $(which pycodestyle)

# Benchmarking command
hyperfine --ignore-failure --warmup 2 --runs 15 --export-markdown=baseline.md 'pycodestyle .'

setup.cfg

[pycodestyle]
ignore = W504
max-line-length = 100

Baseline version: c8e36a0
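
For anyone reproducing the per-function numbers, the `stats` file written by the cProfile command above can be read back with the standard-library `pstats` module. This is only a sketch: the `summarize` helper and its arguments are names made up for illustration, and the file name `stats` is assumed to match the `-o stats` option used above.

```python
import io
import os
import pstats


def summarize(stats_path, pattern):
    """Return the pstats report for entries whose name matches `pattern`.

    Hypothetical helper: `pattern` is treated by pstats as a regular
    expression matched against the "filename:lineno(function)" column.
    """
    out = io.StringIO()
    stats = pstats.Stats(stats_path, stream=out)
    # Sort by time spent inside the function itself (the tottime column).
    stats.sort_stats("tottime").print_stats(pattern)
    return out.getvalue()


if __name__ == "__main__":
    # Only prints something if the profile from the set-up steps exists.
    if os.path.exists("stats"):
        print(summarize("stats", "_is_eol_token"))
        print(summarize("stats", "lstrip"))
```

Filtering on `_is_eol_token` and `lstrip` should give the two rows quoted in the Stats sections.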


# Check if the line's penultimate character is a continuation
# character
if token[4][-2] != '\\':
Contributor Author

Based on local testing with Python 3.12.4, I had assumed that the string length would always be >=2 here.

It looks like that is not the case with all supported versions of Python.
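
To illustrate how the length assumption can break, here is a small sketch. It relies on the `ENDMARKER` token that `tokenize` emits at the end of input, whose `line` field is empty, so indexing it with `[-2]` raises `IndexError`. The `penultimate_is_backslash` helper is a hypothetical name for illustration and is not part of pycodestyle. This shows one case only and may not be the exact case that failed on the other Python versions.

```python
import io
import tokenize


def penultimate_is_backslash(line):
    """Hypothetical helper: True if the second-to-last character is a backslash.

    The length guard avoids the IndexError that line[-2] raises on
    strings shorter than two characters.
    """
    return len(line) >= 2 and line[-2] == '\\'


# Tokenize a one-line module and look at the final token.
tokens = list(tokenize.generate_tokens(io.StringIO("x = 1\n").readline))
end = tokens[-1]

# The final token is ENDMARKER; its physical line is shorter than two characters.
print(tokenize.tok_name[end.type], repr(end.line))
print(penultimate_is_backslash(end.line))
print(penultimate_is_backslash("x = 1 + \\\n"))
```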

@correctmost
Contributor Author

Closing because this approach doesn't work for all Python versions.

@correctmost
Contributor Author

I updated the patch and re-ran the benchmarks.

The additional tokenize.ENDMARKER check seems to have reduced the savings from ~200ms to ~100ms.
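
For readers who want to see the shape of the idea, here is a rough sketch of a fast path with an `ENDMARKER` guard. It is not the actual patch. It assumes the baseline check is roughly "the rest of the line after the token, left-stripped, equals a backslash followed by a newline", which is consistent with the `lstrip` calls in the profile. The names `is_eol_token_slow` and `is_eol_token_fast` are made up for illustration.

```python
import io
import tokenize

NEWLINE = frozenset([tokenize.NL, tokenize.NEWLINE])


def is_eol_token_slow(token):
    """Assumed baseline: always slice and lstrip the rest of the line."""
    return token[0] in NEWLINE or token[4][token[3][1]:].lstrip() == '\\\n'


def is_eol_token_fast(token):
    """Sketch of a fast path that skips lstrip for most tokens."""
    if token[0] in NEWLINE:
        return True
    # ENDMARKER carries an empty line, so it can never be a continuation.
    if token[0] == tokenize.ENDMARKER:
        return False
    line = token[4]
    # A continuation line must end in backslash + newline, so the
    # penultimate character has to be a backslash. The length guard is
    # extra caution for very short lines.
    if len(line) < 2 or line[-2] != '\\':
        return False
    return line[token[3][1]:].lstrip() == '\\\n'


if __name__ == "__main__":
    source = "x = 1 + \\\n    2\n"
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        print(tokenize.tok_name[tok[0]], is_eol_token_fast(tok))
```

The early returns are safe under that assumption because the stripped remainder is always a suffix of the line, so it can only equal backslash-newline when the line itself ends that way.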

If this PR seems too risky because of assumptions about tokenization, feel free to pass on it :).

Stats

Before

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.885    0.000    1.197    0.000 pycodestyle.py:1831(_is_eol_token)
  1472360    0.364    0.000    0.364    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.472 ± 0.196 | 18.067 | 18.848 | 1.00 |

After

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1478156    0.511    0.000    0.511    0.000 pycodestyle.py:1831(_is_eol_token)
   225341    0.055    0.000    0.055    0.000 {method 'lstrip' of 'str' objects}

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `pycodestyle .` | 18.360 ± 0.186 | 18.159 | 18.781 | 1.00 |
